New bulk-solvent model improves model-to-data fit and facilitates map interpretation
Abstract
Biomacromolecular crystals contain between 10 and 90% solvent. Because this solvent is mostly disordered, it cannot be interpreted in terms of an atomic model. Owing to its simplicity and relatively good modeling power, the flat bulk-solvent model is the most commonly used way to account for disordered solvent in modern crystallographic software packages such as CNS, CCP4 and Phenix. This model assumes that the electron density is constant everywhere in the unit cell where no atomic model is placed. While this may be a reasonable approximation for some crystal structures or at the initial stages of structure determination, it may be less accurate at the final stages. Major deviations from the flat-model assumption include: 1) local concentration of specific solvent components, such as lipid belts in membrane proteins, 2) unmodeled ligands, 3) partial occupancy of solvent in small isolated regions (between or inside macromolecules), and 4) lack of solvent in certain regions (hydrophobic cores). These deviations manifest themselves as elevated R factors in the lowest-resolution shells and as residual features in difference maps: positive where the flat solvent model fails to account for distinct features, and negative where the solvent model is applied in regions that contain no solvent. To overcome these limitations, we have proposed a non-uniform bulk-solvent model that allows the solvent to vary across the unit-cell volume. The new model splits the initial binary (0/1) solvent mask into several masks by connectivity analysis. These masks are then subdivided further based on analysis of difference maps. The final set of solvent masks is used to compute the individual bulk-solvent contributions to the total model structure factor.
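The connectivity-analysis step can be illustrated with a minimal sketch. This is not the Phenix implementation: the function names `label_solvent_regions` and `split_mask`, the nested-list grid, the 6-neighbor connectivity, and the periodic wrapping at the cell edges are all assumptions made for illustration. The later subdivision based on difference maps is not shown.

```python
from collections import deque

def label_solvent_regions(mask):
    """Label connected solvent regions in a binary (0/1) mask.

    mask is a nested list indexed [x][y][z]; 1 marks a solvent grid point.
    Returns (labels, n_regions): labels holds 0 for non-solvent points and
    1..n_regions for the connected region each solvent point belongs to.
    Assumption: indices wrap around, since a unit cell repeats periodically.
    """
    nx, ny, nz = len(mask), len(mask[0]), len(mask[0][0])
    labels = [[[0] * nz for _ in range(ny)] for _ in range(nx)]
    # Assumption: face-sharing (6-neighbor) connectivity.
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    n_regions = 0
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                if mask[x][y][z] != 1 or labels[x][y][z] != 0:
                    continue
                # Unlabeled solvent point: start a new region and flood-fill it.
                n_regions += 1
                labels[x][y][z] = n_regions
                queue = deque([(x, y, z)])
                while queue:
                    cx, cy, cz = queue.popleft()
                    for dx, dy, dz in steps:
                        px, py, pz = (cx + dx) % nx, (cy + dy) % ny, (cz + dz) % nz
                        if mask[px][py][pz] == 1 and labels[px][py][pz] == 0:
                            labels[px][py][pz] = n_regions
                            queue.append((px, py, pz))
    return labels, n_regions

def split_mask(mask):
    """Return one binary mask per connected solvent region."""
    labels, n_regions = label_solvent_regions(mask)
    return [
        [[[1 if v == region else 0 for v in row] for row in plane] for plane in labels]
        for region in range(1, n_regions + 1)
    ]

# Toy 6x2x2 grid: solvent slabs at x=0 and x=5 touch across the periodic
# boundary, and a single isolated solvent point sits at (2, 0, 0).
demo_mask = [[[0, 0], [0, 0]] for _ in range(6)]
for x in (0, 5):
    demo_mask[x] = [[1, 1], [1, 1]]
demo_mask[2][0][0] = 1

demo_masks = split_mask(demo_mask)
region_sizes = [sum(v for plane in m for row in plane for v in row) for m in demo_masks]
print(len(demo_masks), region_sizes)
```

In this toy grid the two slabs merge into one region because of the wrap-around, while the isolated point becomes its own mask, which is the kind of small enclosed pocket the abstract lists among the deviations from a flat model.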
Tests on all structures deposited in the PDB that have diffraction data and cross-validation flags available indicate a systematic improvement of the model-to-data fit, with no signs of overfitting as judged by Rfree. Tests on selected models demonstrate notable improvements in map quality, especially for weak features such as ligands or solvent molecules. All described tools will be available in Phenix.
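For readers unfamiliar with the cross-validation criterion mentioned above, the following sketch computes the standard crystallographic R factor, R = Σ| |Fobs| − |Fmodel| | / Σ|Fobs|, separately for the working reflections and for the reflections flagged as the free (test) set. The function names and the toy amplitudes are hypothetical, and the model amplitudes are assumed to be already on the same scale as the observed ones.

```python
def r_factor(f_obs, f_model):
    """R = sum(| |Fobs| - |Fmodel| |) / sum(|Fobs|) over the given reflections."""
    denominator = sum(abs(o) for o in f_obs)
    if denominator == 0:
        raise ValueError("no observed amplitude to normalize by")
    numerator = sum(abs(abs(o) - abs(m)) for o, m in zip(f_obs, f_model))
    return numerator / denominator

def r_work_and_free(f_obs, f_model, free_flags):
    """Split reflections by cross-validation flag and return (R_work, R_free)."""
    work = [(o, m) for o, m, free in zip(f_obs, f_model, free_flags) if not free]
    test = [(o, m) for o, m, free in zip(f_obs, f_model, free_flags) if free]
    r_work = r_factor([o for o, _ in work], [m for _, m in work])
    r_free = r_factor([o for o, _ in test], [m for _, m in test])
    return r_work, r_free

# Made-up amplitudes for four reflections; the third one is in the free set.
f_obs = [100.0, 80.0, 60.0, 40.0]
f_model = [90.0, 84.0, 63.0, 38.0]
free_flags = [False, False, True, False]

r_work, r_free = r_work_and_free(f_obs, f_model, free_flags)
print(round(r_work, 4), round(r_free, 4))
```

Because the free reflections are excluded from refinement, a drop in R_work that is not matched by R_free is the usual warning sign of overfitting; the abstract reports that no such sign was observed.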